19 research outputs found

    Misogyny Detection in Social Media on the Twitter Platform

    The thesis is devoted to the problem of misogyny detection in social media. In this work we analyse the difference between offensive language in general and misogynistic language in social media, and review the best existing approaches to detecting offensive and misogynistic language, which are based on classical machine learning and neural networks. We also review recent shared tasks aimed at detecting misogyny in social media, several of which we have participated in. We propose an approach to the detection and classification of misogyny in texts, based on an ensemble of classical machine learning models: Logistic Regression, Naive Bayes, and Support Vector Machines. At the preprocessing stage we also used some linguistic features and novel approaches that allow us to improve the quality of classification. We tested the model on real datasets, both English and multilingual corpora. The results we achieved with our model are highly competitive in this area and demonstrate the capability for future improvement.

    Classifying Misogynistic Tweets Using a Blended Model: the AMI Shared Task in IBEREVAL 2018

    This article describes a possible solution for the Automatic Misogyny Identification (AMI) Shared Task at IBEREVAL-2018. The proposed technique is based on combining several simpler classifiers into one more complex blended model, which classifies the data taking into account the class membership probabilities calculated by the simpler models. We used the Logistic Regression, Naive Bayes, and SVM classifiers. The experimental results show that the blended model works better than the simpler models for all three types of classification, for both binomial classification (Misogyny Identification, Target Classification) and multinomial classification (Misogynistic Behavior).
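    As a rough illustration of combining class probabilities from several simpler models, the sketch below averages them (soft voting). This is a simplification chosen for brevity, not the blended model from the article, whose exact combination rule is not given in the abstract. The function names, class labels, and probability values are all hypothetical.

```python
from typing import Dict, List


def blend_probabilities(model_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the class probabilities produced by several base models."""
    classes = model_probs[0].keys()
    n_models = len(model_probs)
    return {c: sum(p[c] for p in model_probs) / n_models for c in classes}


def predict(model_probs: List[Dict[str, float]]) -> str:
    """Return the class with the highest averaged probability."""
    blended = blend_probabilities(model_probs)
    return max(blended, key=blended.get)


# Hypothetical outputs of three base classifiers for a single tweet.
lr_probs = {"misogynous": 0.70, "not_misogynous": 0.30}
nb_probs = {"misogynous": 0.40, "not_misogynous": 0.60}
svm_probs = {"misogynous": 0.65, "not_misogynous": 0.35}

label = predict([lr_probs, nb_probs, svm_probs])
print(label)
```

    A trained meta-classifier over the same probabilities (stacking) would be a closer analogue of a blended model; averaging is only the simplest way to show the idea.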

    Misogyny Detection and Classification in English Tweets: The Experience of the ITT Team

    The problem of online misogyny and offences directed at women has become increasingly widespread, and the automatic detection of such messages is an urgent priority. In this paper, we present an approach based on an ensemble of Logistic Regression, Support Vector Machines, and Naïve Bayes models for the detection of misogyny in texts extracted from the Twitter platform. Our method was presented as part of our participation in the Automatic Misogyny Identification (AMI) Shared Task in the EVALITA 2018 evaluation campaign.

    Automatic Misogyny Detection in Social Media: a Survey

    This article presents a survey of automated misogyny identification techniques in social media, especially on Twitter. This problem is urgent because of the speed at which the volume of messages on social platforms grows and the widespread use of offensive language (including misogynistic language) in them. In this article we survey approaches proposed in the literature to solve the problem of misogynistic message recognition. These include classical machine learning models such as Support Vector Machine, Naive Bayes, and Logistic Regression, ensembles of different classical machine learning models, and deep neural networks such as Long Short-Term Memory and Convolutional Neural Networks. We consider the results of experiments with these models on tweets in different languages: English, Spanish and Italian. The survey describes some features which help to identify misogynistic tweets and some challenges whose aim was to create misogynistic language classifiers. The survey includes not only models which help to identify misogynistic language, but also systems which help to recognize the target of an offense (an individual or a group of persons).

    Detection of Truthful, Semi-Truthful, False and Other News with Arbitrary Topics Using BERT-Based Models

    Easy and uncontrolled access to the Internet provokes the wide propagation of false information, which circulates freely on the Internet. Researchers usually solve the problem of fake news detection (FND) in the framework of a known topic and binary classification. In this paper we study the ability of BERT-based models to detect fake news in a news flow with unknown topics and four categories: true, semi-true, false and other. The object of consideration is the dataset proposed by the CheckThat! Lab for the CLEF-2022 conference. The subjects of consideration are the models SBERT, RoBERTa, and mBERT. To improve the quality of classification we use two methods: the addition of a known dataset (LIAR), and the combination of several classes (true + semi-true, false + semi-true). The results outperform the existing achievements, although the state of the art in the FND area is still far from practical application.

    Offensive Language Recognition in Social Media

    This article proposes an approach to solving the problem of multi-class classification within the framework of aggressive language recognition on Twitter. At the preprocessing stage, external data based on information from the links in the dataset is added to the existing dataset. This made it possible to expand the training dataset and thereby to improve the quality of the classification. The model created is an ensemble of classical machine learning models including Logistic Regression, Support Vector Machines, Naive Bayes models and a combination of Logistic Regression and Naive Bayes. The macro F1-score obtained in one of the experiments reached 0.61, which exceeds the state-of-the-art published value by 1 percentage point. This indicates the potential value of the proposed approach in the field of hate speech recognition in social media. The work of Paolo Rosso was partially funded by the Spanish MICINN under the research project MISMIS-FAKEnHATE on Misinformation and Miscommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31). Shushkevich, E.; Cardiff, J.; Rosso, P.; Akhtyamova, L. (2020). Offensive Language Recognition in Social Media. Computación y Sistemas. 24(2):523-532. https://doi.org/10.13053/CyS-24-2-3376
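    The macro F1-score mentioned in the abstract above is the unweighted mean of the per-class F1 scores. The sketch below computes it in plain Python; the labels and predictions are made-up illustrative values, not data from the article.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either sequence."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for four tweets.
gold = ["offensive", "offensive", "neutral", "hate"]
pred = ["offensive", "neutral", "neutral", "hate"]
print(round(macro_f1(gold, pred), 4))
```

    Because every class contributes equally, macro F1 penalises poor performance on rare classes more than accuracy does.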

    SPICED: News Similarity Detection Dataset with Multiple Topics and Complexity Levels

    Nowadays, the use of intelligent systems to detect redundant information in news articles has become especially prevalent with the proliferation of news media outlets in order to enhance user experience. However, the heterogeneous nature of news can lead to spurious findings in these systems: simple heuristics such as whether a pair of news items are both about politics can provide strong but deceptive downstream performance. Segmenting news similarity datasets into topics improves the training of these models by forcing them to learn how to distinguish salient characteristics under narrower domains. However, this requires the existence of topic-specific datasets, which are currently lacking. In this article, we propose a new dataset of similar news, SPICED, which includes seven topics: Crime & Law, Culture & Entertainment, Disasters & Accidents, Economy & Business, Politics & Conflicts, Science & Technology, and Sports. Furthermore, we present four distinct approaches for generating news pairs, which are used in the creation of datasets specifically designed for the news similarity detection task. We benchmarked the created datasets using MinHash, BERT, SBERT, and SimCSE models.
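    MinHash, one of the baselines named above, estimates the Jaccard similarity of two texts from short signatures. The following is a minimal standard-library sketch, assuming word-trigram shingles and seeded SHA-1 hashes; these choices, the function names, and the sample sentences are illustrative assumptions and do not reflect the configuration used in the article.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Split text into overlapping word k-grams (a single shingle if the text is short)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(items: Set[str], num_hashes: int = 64) -> List[int]:
    """For each seeded hash function, keep the minimum hash value over all items."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest(), "big")
            for item in items
        ))
    return signature


def estimated_jaccard(text_a: str, text_b: str, num_hashes: int = 64) -> float:
    """Fraction of signature positions on which the two texts agree."""
    sig_a = minhash_signature(shingles(text_a), num_hashes)
    sig_b = minhash_signature(shingles(text_b), num_hashes)
    return sum(a == b for a, b in zip(sig_a, sig_b)) / num_hashes


# Hypothetical headlines.
news_a = "the central bank raised interest rates again on tuesday"
news_b = "the central bank raised interest rates again this week"
print(estimated_jaccard(news_a, news_b))
```

    More hash functions give a tighter estimate at the cost of longer signatures.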

    Classification of Schoolchildren on Professional Trajectories using Experience of Successful Specialists

    In this paper, we propose a new approach to the vocational guidance of schoolchildren based on classifying pupils' wishes among given professional trajectories, which are represented by profiles of successful professionals. Both wishes and profiles are free-text replies to a questionnaire proposed by skilled psychologists. Such an approach avoids the well-known deficiencies of traditional methods, including binary questioning, talks about concrete professions, and interviews with school psychologists. We use simple term selection for preprocessing and the traditional voting method for classification. These procedures are discussed and the proposed approach is preliminarily checked on invited specialists. This joint Russian-Irish research has been carried out with Moscow schoolchildren (2 schools) and Moscow specialists (2 trajectories). The results of the presented pilot study look very promising. It is the basis for current applied research in Moscow and future activities in Dublin.

    ATLAS Run 1 searches for direct pair production of third-generation squarks at the Large Hadron Collider


    Measurement of the charge asymmetry in top-quark pair production in the lepton-plus-jets final state in pp collision data at $\sqrt{s}=8\,\mathrm{TeV}$ with the ATLAS detector
